
19 research outputs found

Brook GLES Pi: democratising accelerator programming

Author: Bakhoda Ali
Bellard Fabrice
Leskela Jyrki
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/08/2018

Nowadays computing relies heavily on accelerators; however, the cost of the hardware prevents equal access to heterogeneous programming. In this work we present Brook GLES Pi, a port of the accelerator programming language Brook. Our solution, primarily focused on the educational platform Raspberry Pi, makes it possible to teach, experiment with, and take advantage of heterogeneous programming on any low-cost embedded device featuring an OpenGL ES 2 GPU, democratising access to accelerator programming. This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Peer Reviewed. Postprint (author's final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Monoclinic modification of 1,2-bis(diphenylselenophosphinoyl)ethane

Author: Ali Nemati Kharat
Alireza Abbasi
Ghasem Bakhoda
Lobana
Maryam Ahmadian
Risto
Sheldrick
Spek
Publication venue: International Union of Crystallography
Publication date: 01/11/2008

The complete molecule of the title compound, C26H24P2Se2, is generated by crystallographic 2-fold symmetry, with the rotation axis bisecting the central C—C bond. The dihedral angle between the terminal aromatic rings is 74.1 (1)°

Crossref

Directory of Open Access Journals

PubMed Central

Complexity effective memory access scheduling for many-core accelerator architectures

Author: Ali Bakhoda
George L. Yuan
Tor M. Aamodt
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Modern DRAM systems rely on memory controllers that employ out-of-order scheduling to maximize row access locality and bank-level parallelism, which in turn maximizes DRAM bandwidth. This is especially important in graphics processing unit (GPU) architectures, where the large quantity of parallelism places a heavy demand on the memory system. The logic needed for out-of-order scheduling can be expensive in terms of area, especially when compared to an in-order scheduling approach. In this paper, we propose a complexity-effective solution to DRAM request scheduling which recovers most of the performance loss incurred by a naive in-order first-in first-out (FIFO) DRAM scheduler compared to an aggressive out-of-order DRAM scheduler. We observe that the memory request stream from individual GPU “shader cores” tends to have sufficient row access locality to maximize DRAM efficiency in most applications without significant reordering. However, the interconnection network across which memory requests are sent from the shader cores to the DRAM controller tends to finely interleave the numerous memory request streams in a way that destroys the row access locality of the resultant stream seen at the DRAM controller. To address this, we employ an interconnection network arbitration scheme that preserves the row access locality of individual memory request streams and, in doing so, achieves DRAM efficiency and system performance close to that achievable by using out-of-order memory request scheduling while doing so with a simpler design. We evaluate our interconnection network arbitration scheme using crossbar, mesh, and ring networks for a baseline architecture of 8 memory channels, each controlled by its own DRAM controller and 28 shader cores (224 ALUs), supporting up to 1,792 in-flight memory requests. Our results show that our interconnect arbitration scheme coupled with a banked FIFO in-order scheduler obtains up to 91% of the performance obtainable with an out-of-order memory scheduler for a crossbar network with eight-entry DRAM controller queues.

CiteSeerX

Crossref

Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead

Author: Alberto Ros
Bakhoda Ali
Blake
Erik Hagersten
Esteve Albert
Hechtman Blake A.
Hower Derek R.
Konstantinos Koukos
Li Sheng
Power Jason
Sarita
Seshadri Vivek
Singh Inderpreet
Stefanos Kaxiras
Szafaryn Lukasz G.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Designing network-on-chips for throughput accelerators

Author: Bakhoda Ali
Publication venue: University of British Columbia Press
Publication date: 01/05/2014

Physical limits of power usage for integrated circuits have steered the microprocessor industry towards parallel architectures in the past decade. Modern Graphics Processing Units (GPUs) are a form of parallel processor that harnesses chip area more effectively than traditional single-threaded architectures by favouring application throughput over latency. Modern GPUs can be used as throughput accelerators: accelerating massively parallel non-graphics applications. As the number of compute cores in throughput accelerators increases, so does the importance of efficient memory subsystem design. In this dissertation, we present system-level microarchitectural analysis and optimizations with an emphasis on the memory subsystem of throughput accelerators that employ Bulk-Synchronous-Parallel programming models such as CUDA and OpenCL. We model the whole throughput accelerator as a closed-loop system in order to capture the effects of complex interactions of microarchitectural components: we simulate components such as compute cores, on-chip network and memory controllers with cycle-level accuracy. For this purpose, the first version of the GPGPU-Sim simulator that was capable of running unmodified applications by emulating NVIDIA's virtual instruction set was developed. We use this simulator to model and analyze several applications and explore various microarchitectural tradeoffs for throughput accelerators to better suit these applications. Based on our observations, we identify the Network-on-Chip (NoC) component of the memory subsystem as our main optimization target and set out to design throughput-effective NoCs for future throughput accelerators. We provide a new framework for NoC researchers to ensure the optimizations are "throughput effective", namely, parallel application-level performance improves per unit chip area. We then use this framework to guide the development of several optimizations.
Accelerator workloads demand high off-chip memory bandwidth, resulting in a many-to-few-to-many traffic pattern. Leveraging this observation, we reduce NoC area by proposing a checkerboard NoC which utilizes routers with limited connectivity. Additionally, we improve performance by increasing the terminal bandwidth of memory controller nodes to better handle frequent read-reply traffic. Furthermore, we propose a double checkerboard inverted NoC organization which maintains the benefits of these optimizations while having a simpler routing mechanism and smaller area, and which results in a 24.3% improvement in average application throughput per unit area. Applied Science, Faculty of; Electrical and Computer Engineering, Department of; Graduat

University of British Columbia: cIRcle - UBC's Information Repository

Synthesis, Characterization, and Crystal Structures of Tris(2-pyridyl)phosphine Sulfide and Selenide

Author: Abolghasem Bakhoda
Ali Nemati Kharat
Alireza Abbasi
Sheldrick G. M.
Taraneh Hajiashrafi
Publication venue: 'Informa UK Limited'

Crossref

HARP

Author: Ahmad Khonsari
Ahmad Lashgar
Amirali Baniasadi
Bakhoda Ali
Volkov Vasily
Wilson W.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

iConn

Author: Bakhoda Ali
Becker Aaron
Brookwood Nathan
Lee Paul
Nilanjan Goswami
Tao Li
Vuduc Richard
Zhongqi Li
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Architecting the Last-Level Cache for GPUs using STT-RAM Technology

Author: Asadinia Marjan
Bakhoda Ali
Blem Emily
Kuo Hsien-Kai
Smullen W.
Stratton John A.
Zhao Jishen
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref